Two-Terminal Lithium-Mediated Artificial Synapses with Enhanced Weight Modulation for Feasible Hardware Neural Networks

Highlights

- Li-mediated artificial synapses with a vertical two-terminal configuration, capable of various synaptic behaviors including bio-plausible synaptic plasticity, were successfully demonstrated for the first time and thoroughly explored.
- Synaptic characteristics based on the progressive dearth of Li in LixCoO2 films are precisely controlled by the weight-control spikes, achieving extraordinary weight-control functionality.
- In artificial neural network simulations, the LixCoO2-based neuromorphic system showed excellent accuracy comparable to the theoretical maximum, owing to its low nonlinearity and programming error, suggesting the feasibility of hardware neural network implementation.

Supplementary Information: The online version contains supplementary material available at 10.1007/s40820-023-01035-3.


Supplementary Figures
As the temperature rises, the semiconducting LixCoO2 film becomes more conductive and the diffusion of Li ions intensifies, causing the programmed state to deteriorate more rapidly.

S1 Nonlinearity Estimation of LTP/LTD Curves
In this paper, two methods were applied to estimate the nonlinearity of the LTP/LTD curves.

S1.1 Nonlinearity (β)
The nonlinearity (β) in the synaptic potentiation and depression can be derived from the weight-update relations below, which are widely used for most synaptic devices [S1-S7]:

$$G_{k+1} = G_k + \alpha_P \, e^{-\beta_P \frac{G_k - G_{\min}}{G_{\max} - G_{\min}}} \quad \text{(potentiation)}$$

$$G_{k+1} = G_k - \alpha_D \, e^{-\beta_D \frac{G_{\max} - G_k}{G_{\max} - G_{\min}}} \quad \text{(depression)}$$

Here, the subscripts P and D denote weight updates in potentiation and depression, respectively. Gk and Gk+1 indicate the conductance of the artificial synaptic device after the k-th and (k+1)-th weight-control spikes are applied, respectively. Gmin and Gmax are the minimum and maximum device conductance. The variable α represents the step size during the weight update, and β is the nonlinearity. The ideal β is 0; the smaller the value, the more linear the weight update.
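For illustration, the exponential update relations above can be reproduced numerically. The following is a minimal Python sketch; the pulse count and the α, β values in the example are illustrative assumptions, not values from the paper.

```python
import numpy as np

def ltp_ltd_curve(alpha_p, alpha_d, beta_p, beta_d,
                  g_min=0.0, g_max=1.0, n=50):
    """Simulate an LTP/LTD curve with the exponential weight-update model:
    n potentiation spikes followed by n depression spikes."""
    g = [g_min]
    for _ in range(n):   # potentiation: step shrinks as G approaches G_max
        dg = alpha_p * np.exp(-beta_p * (g[-1] - g_min) / (g_max - g_min))
        g.append(min(g[-1] + dg, g_max))
    for _ in range(n):   # depression: step shrinks as G approaches G_min
        dg = alpha_d * np.exp(-beta_d * (g_max - g[-1]) / (g_max - g_min))
        g.append(max(g[-1] - dg, g_min))
    return np.array(g)

# Example: beta = 0 gives a perfectly linear update; larger beta bends the curve.
curve = ltp_ltd_curve(alpha_p=0.02, alpha_d=0.02, beta_p=3.0, beta_d=3.0)
```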
S1.2 Nonlinearity Factor (α)
The second method fits the LTP/LTD curves with the power-law relation

$$G = G_{\min} + \left( G_{\max} - G_{\min} \right) \omega^{\alpha}$$

Here, Gmin and Gmax are the minimum and maximum device conductance, respectively. The parameter α is a nonlinearity factor that determines the potentiation (αP) or depression (αD) behavior. ω is an internal variable between 0 and 1 that increases (decreases) with the application of weight-enhancing (weight-weakening) spikes. For an ideal synaptic device with αP = αD = 1, a perfectly linear and symmetric weight update occurs. The shapes of the LTP/LTD curves are convex-up if α > 1 and concave-down if α < 1.

S2 Symmetricity Determination in Weight Updates
Figure S7a depicts the LTP/LTD curve programmed 2n times and the device conductance states at the k-th (orange dot) and (2n-k)-th (green dot) weight updates. Symmetricity is defined as the reciprocal of the symmetric error (symmetricity = 1/symmetric error) [S3]. The symmetric error compares the normalized conductance at the k-th and (2n-k)-th weight updates, where GN, Gmax, and Gmin signify the normalized, maximum, and minimum values of the device conductance, respectively. The complete-asymmetry (symmetric error = ∞) and perfect-symmetry (symmetric error = 0) cases of the LTP/LTD curves are illustrated in Fig. S7b.

Fig. S7
Symmetricity determination in weight updates. a LTP/LTD curve programmed 2n times and device conductance states at the k-th (orange dot) and (2n-k)-th (green dot) weight updates. b The complete asymmetry (symmetric error = ∞) and perfect symmetry (symmetric error = 0) cases of the LTP/LTD curves
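The symmetric error can then be evaluated directly from a measured LTP/LTD curve. The sketch below assumes a simple mean-absolute-difference form between GN(k) and GN(2n-k); this is an assumed stand-in, as the exact expression follows Ref. [S3].

```python
import numpy as np

def symmetricity(g, g_min=None, g_max=None):
    """Symmetricity of an LTP/LTD curve g programmed 2n times.

    Assumed form: symmetric error = mean |G_N(k) - G_N(2n-k)|, with
    symmetricity its reciprocal (exact definition per Ref. [S3])."""
    g = np.asarray(g, dtype=float)
    g_min = g.min() if g_min is None else g_min
    g_max = g.max() if g_max is None else g_max
    g_n = (g - g_min) / (g_max - g_min)        # normalized conductance G_N
    n = (len(g) - 1) // 2
    k = np.arange(1, n)
    err = np.mean(np.abs(g_n[k] - g_n[2 * n - k]))
    return np.inf if err == 0.0 else 1.0 / err  # perfect symmetry -> infinite symmetricity
```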

S3 Asymmetric Ratio Analysis of LTP/LTD Curves
The asymmetric ratio (AR) between the two successive LTP and LTD curves, each with n weight updates, is defined as

$$AR = \max_{1 \le k \le n} \frac{\left| G_P(k) - G_D(k) \right|}{\left| G_P(n) - G_D(n) \right|}$$

where GP(k) and GD(k) indicate the average conductance values during potentiation and depression, respectively [S8, S9, S12]. GP(n) and GD(n) represent the device conductance after being programmed n times. The asymmetric ratio should be zero in the ideal case.
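A direct Python implementation of the AR metric as reconstructed above is given below; the indexing convention (the depression trace aligned to the same weight index as the potentiation trace) is an assumption and may need adapting to the raw pulse order.

```python
import numpy as np

def asymmetric_ratio(g_p, g_d):
    """AR = max_k |G_P(k) - G_D(k)| / |G_P(n) - G_D(n)|.

    Assumes g_d[k] corresponds to the same weight index as g_p[k];
    reverse the raw depression pulse order first if necessary."""
    g_p = np.asarray(g_p, dtype=float)
    g_d = np.asarray(g_d, dtype=float)
    return np.max(np.abs(g_p - g_d)) / abs(g_p[-1] - g_d[-1])
```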

S4 CNN-Based Image Recognition Simulation
A convolutional neural network (CNN) is a class of deep learning algorithms specialized in processing pixel data for image recognition [S13]. Fig. S15a represents a schematic of image inference for ImageNet in ResNet50-v1.5, a 50-layer deep CNN model that carries out feature extraction and inference [S14, S15]. In the feature extraction step, hierarchical patterns of increasing complexity are scaled down and assembled into simpler, smaller patterns embossed on the filters. In the inference stage, the accumulated, stepwise-processed data are flattened into one dimension and fed into a fully connected layer. The input image is classified by returning the most probable result among the 1000 classes of ImageNet via the activation function applied to the weighted inputs. Fig. S15b depicts the data processing and algorithm architecture of the ResNet50-v1.5 model. The model consists of one max-pooling layer, one average-pooling layer, and five convolutional stages composed of several residual blocks, which are the foundational building blocks of the ResNet architecture [S16]. The first residual block of each stage is a convolutional block that halves the input scale. Fig. S15c presents the topology of a residual block, also known as an identity block, with a skip connection [S17]. The identity function is used as a shortcut that permits gradients to propagate directly into deeper layers of the network, bypassing the nonlinear activation functions. When the desired underlying mapping is defined as H(xl), the goal is to optimize H(xl) = F(xl) + xl, where F(xl) denotes the form of the input xl after passing through the convolutional layers, batch normalizations, and ReLU activation functions. Since xl enters unchanged as the input, the optimization concerns only F(xl) = H(xl) - xl; F(xl) behaves like a residual, hence the name 'residual block.'
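The identity block described above can be written compactly. Below is a minimal PyTorch sketch of a residual block with a skip connection; the two-convolution layout and channel count are illustrative (ResNet50's bottleneck blocks use three convolutions), so this shows the principle H(x) = F(x) + x rather than the exact ResNet50-v1.5 block.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Identity residual block: learns F(x) and outputs H(x) = F(x) + x."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        f = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))  # residual F(x)
        return self.relu(f + x)  # skip connection: H(x) = F(x) + x

# Example: y = ResidualBlock(64)(torch.randn(1, 64, 56, 56))
```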

S5 MLP-Based Image Recognition Simulation
The file type [S18], MNIST [S19], and fashion MNIST [S20] datasets were chosen for image recognition. The file type database is classified into nine categories (AES-256, GZIP, ELF, DOC, PDF, GIF, JPG, PNG, and HTML). Each data type can be identified by evaluating its performance across three input spaces: the byte probability distribution, the power spectral density, and the sliding-window entropy of a sequence of bytes in the file [S18]. The large MNIST and fashion MNIST datasets consist of ten types of handwritten digits (0-9) and ten types of Zalando's article images (T-shirts, trousers, pullovers, dresses, coats, sandals, shirts, sneakers, bags, and ankle boots), respectively. For image recognition on large MNIST and fashion MNIST, the 28 × 28 pixels of each image correspond to the 28 × 28 pre-neurons that serve as the input layer of the multilayer perceptron. The monochrome images of MNIST and fashion MNIST were transformed to grayscale values ranging from 0 to 255 and supplied to the pre-neuron inputs. A single hidden layer with 300 hidden neurons lies between the input and output layers. The output layer has 10 output neurons, each corresponding to one of the 10 image types in the input dataset. All neurons in each layer are fully connected to all neurons in the following layer via synapses. Each neural network was trained for 40 epochs, with each epoch searching for an optimal inference model by training and testing on randomly assigned training sets. The training was done individually on allocated training sets of 100, 1,000, and 10,000 images. Forward propagation proceeds through the activation of neurons, which transmit a signal from the previous neuron to the next according to the synaptic weight. As a nonlinear activation function, the sigmoid function controls the firing of neurons. The learning algorithm described above is programmed in Python; a minimal sketch follows the table below.

Dataset | Input shape | Classes | Model | Description
Large MNIST [S19] | (28, 28, 1) | 10 | CNN (CNN6 v2) | Four convolutional + two dense layers; 119.3K weights
CIFAR-10 [S28] | (32, 32, 3) | 10 | ResNet56 | Follows the architecture in Ref. [S17]; 861.8K weights
CIFAR-100 | (32, 32, 3) | 100 | ResNet56 | Follows the architecture in Ref. [S17] with 4× more channels (16× weights)
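For reference, the 784-300-10 perceptron described above can be sketched as follows in Python. The learning rate, weight initialization, and squared-error loss are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class MLP:
    """784-300-10 multilayer perceptron with sigmoid activations."""

    def __init__(self, n_in=784, n_hidden=300, n_out=10, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, n_in ** -0.5, (n_in, n_hidden))
        self.w2 = rng.normal(0.0, n_hidden ** -0.5, (n_hidden, n_out))
        self.lr = lr

    def forward(self, x):                   # x: (batch, 784), pixels scaled to [0, 1]
        self.h = sigmoid(x @ self.w1)       # hidden layer, 300 neurons
        self.y = sigmoid(self.h @ self.w2)  # output layer, one neuron per class
        return self.y

    def backward(self, x, target):          # one backpropagation step (squared error)
        d_out = (self.y - target) * self.y * (1.0 - self.y)
        d_hid = (d_out @ self.w2.T) * self.h * (1.0 - self.h)
        self.w2 -= self.lr * self.h.T @ d_out
        self.w1 -= self.lr * x.T @ d_hid
```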